COMPUTER VISION LABORATORY
CENTER FOR AUTOMATION RESEARCH

UNIVERSITY OF MARYLAND
COLLEGE PARK, MARYLAND 20742-3275


CAR-TR-881 

CS-TR-3882 


N00014-95-1-0521 
March  1998 


Estimating  Relative  Vehicle  Motions 
in Traffic Scenes

Zoran Duric^{1,2}, Roman Goldenberg^3, Ehud Rivlin^{1,3},
Azriel Rosenfeld^1

^1 Center for Automation Research
University of Maryland
College Park, MD 20742-3275

^2 Computer Science Department
George Mason University
Fairfax, VA 22030-4444

^3 Department of Computer Science
Technion - Israel Institute of Technology
Haifa, Israel 32000


Abstract 

Autonomous  operation  of  a  vehicle  on  a  road  calls  for  understanding  of  various  events  involving 
the  motions  of  the  vehicles  in  its  vicinity.  In  this  paper  we  show  how  a  moving  vehicle  which 
is  carrying  a  camera  can  estimate  the  relative  motions  of  nearby  vehicles.  We  present  a  model 
for  the  motion  of  the  observing  vehicle,  and  show  how  to  “stabilize”  it,  i.e.  to  correct  the  image 
sequence  so  that  transient  motions  resulting  from  bumps,  etc.  are  removed  and  the  sequence 
corresponds  more  closely  to  the  sequence  that  would  have  been  collected  if  the  motion  had  been 
smooth.  We  also  model  the  motions  of  nearby  vehicles  and  show  how  to  detect  their  motions 
relative  to  the  observing  vehicle.  We  present  results  for  several  road  image  sequences  which 
demonstrate  the  effectiveness  of  our  approach. 

1  Introduction


Autonomous  operation  of  a  vehicle  on  a  road  calls  for  understanding  of  various  events  involving 
the motions of the vehicles in its vicinity. In normal traffic flow, most of the vehicles on a road
move  in  the  same  direction  without  major  changes  in  their  distances  and  relative  speeds.  When 
a  nearby  vehicle  deviates  from  this  norm  (e.g.  when  it  passes  or  changes  lanes),  or  when  it  is 
on  a  collision  course,  some  action  may  need  to  be  taken.  In  this  paper  we  show  how  a  vehicle 
carrying  a  camera  can  estimate  the  relative  motions  of  nearby  vehicles. 

Understanding  the  relative  motions  of  vehicles  requires  modeling  both  the  motion  of  the 
observing  vehicle  and  the  motions  of  the  other  vehicles.  We  represent  the  motions  of  vehicles 
using  a  Darboux  motion  model  that  corresponds  to  the  motion  of  an  object  moving  along  a 
smooth  curve  that  lies  on  a  smooth  surface.  We  show  that  deviations  from  Darboux  motion 
correspond  primarily  to  small,  rapid  rotations  around  the  axes  of  the  vehicle.  These  rotations 
arise  from  the  vehicle’s  suspension  elements  in  response  to  unevenness  of  the  road.  We  estimate 
bounds  on  both  the  smooth  rotations  due  to  Darboux  motion  (from  highway  design  principles) 
and  the  non-smooth  rotations  due  to  the  suspension.  We  show  that  both  types  of  rotational 
motion,  as  well  as  the  non-smooth  translational  component  of  the  motion  (bounce),  are  small 
relative  to  the  smooth  (Darboux)  translational  motion  of  the  vehicle. 

This  analysis  is  used  to  model  the  motions  of  both  the  observer  and  observed  vehicles.  We 
use  the  analysis  to  show  that  only  the  rotational  velocity  components  of  the  observer  vehicle 
are important. On the other hand, the rotational velocity components of an observed vehicle
are  negligible  compared  to  its  translational  velocity.  As  a  consequence  we  need  to  estimate  the 
rotational  velocity  components  only  for  the  observing  vehicle.  This  is  the  case  even  when  an 
observed  vehicle  is  changing  its  direction  of  motion  relative  to  the  observing  vehicle  (turning 
or  changing  lanes);  the  turn  shows  up  as  a  gradual  change  in  the  direction  of  the  relative 
translational  velocity. 

An  important  consequence  of  the  Darboux  motion  model  is  that  for  a  fixed  forward-looking 
camera  mounted  on  the  observer  vehicle  the  direction  of  translation  (and  therefore  the  position 
of  the  focus  of  expansion  (FOE))  remains  the  same  in  the  images  obtained  by  the  camera.  We 
use  this  fact  to  estimate  the  observing  vehicle’s  rotational  velocity  components;  this  is  done  by 
finding  the  rotational  flow  which,  when  subtracted  from  the  observed  flow,  leaves  a  radial  flow 
pattern  (radiating  from  the  FOE)  of  minimal  magnitude. 

We describe the motion field using full perspective projection, estimate its rotational
components, and derotate the field. The flow fields of nearby vehicles are then, under the Darboux
motion  model,  pure  translational  fields.  We  analyze  the  motions  of  the  other  vehicles  under  weak 
perspective  projection,  and  derive  their  motion  parameters.  We  present  results  for  several  road 
image  sequences  obtained  from  cameras  carried  by  moving  vehicles.  The  results  demonstrate 
the  effectiveness  of  our  approach. 

In the next section we present motion models for road vehicles, discuss ideal and real vehicle
motion, and analyze the relative sizes of the smooth and non-smooth velocity components. In
Section 3 we discuss the image motion and describe a way to estimate the necessary derotation.
Section 4 describes methods of estimating nearby vehicle motions from the normal flow field.
Section 5 presents experimental results for several sequences taken at different locations. In
Section 6 we review some prior work on various topics that are related to this paper, involving
road detection, vehicle detection, and motion analysis, and compare it with the approach
presented in this paper.

Figure 1: The Darboux frame moves along the path $\Gamma$ which lies on the surface $\Sigma$.


2  Motion  Models  of  Highway  Vehicles 


The ideal motion of a ground vehicle does not have six degrees of freedom. If the motion is
(approximately) smooth it can be described as motion along a smooth trajectory $\Gamma$ lying on a
smooth surface $\Sigma$. Moreover, we shall assume that the axes of the vehicle (the fore/aft, crosswise,
and up/down axes) are respectively parallel to the axes of the Darboux frame defined by $\Gamma$ and
$\Sigma$. These axes are defined by the tangent $\mathbf{t}$ to $\Gamma$ (and $\Sigma$), the second tangent $\mathbf{v}$ to $\Sigma$ (orthogonal
to $\mathbf{t}$), and the normal $\mathbf{s}$ to $\Sigma$ (see Figure 1). Our assumption about the axes is reasonable for
the ordinary motions of standard types of ground vehicles; in particular, we are assuming that
the first two vehicle axes are parallel to the surface and that the vehicle's motion is parallel to
its fore/aft axis (the vehicle is not skidding).

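Consider a point $O$ moving along a (space) curve $\Gamma$. There is a natural coordinate system
Otnb associated with $\Gamma$, defined by the tangent $\mathbf{t}$, normal $\mathbf{n}$, and binormal $\mathbf{b}$ of $\Gamma$. The triple
$(\mathbf{t}, \mathbf{n}, \mathbf{b})$ is called the moving trihedron or Frenet-Serret coordinate frame. We have the
Frenet-Serret formulas [22]

$$\mathbf{t}' = \kappa\,\mathbf{n}, \qquad \mathbf{n}' = -\kappa\,\mathbf{t} + \tau\,\mathbf{b}, \qquad \mathbf{b}' = -\tau\,\mathbf{n} \tag{1}$$

where $\kappa$ is the curvature and $\tau$ the torsion of $\Gamma$.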

When the curve $\Gamma$ lies on a smooth surface $\Sigma$, it is more appropriate to use the Darboux
frame $(\mathbf{t}, \mathbf{v}, \mathbf{s})$ [22]. We take the first unit vector of the frame to be the tangent $\mathbf{t}$ of $\Gamma$ and
the surface normal $\mathbf{s}$ to be the third frame vector; finally we obtain the second frame vector as
$\mathbf{v} = \mathbf{s} \times \mathbf{t}$ (see Figure 1). Note that $\mathbf{t}$ and $\mathbf{v}$ lie in the tangent plane of $\Sigma$. Since the vector $\mathbf{t}$
belongs to both the Otnb and Otvs frames, they differ only by a rotation around $\mathbf{t}$, say through
an angle $\psi \equiv \psi(s)$. We thus have

$$\begin{pmatrix} \mathbf{v} \\ \mathbf{s} \end{pmatrix} = \begin{pmatrix} \cos\psi & \sin\psi \\ -\sin\psi & \cos\psi \end{pmatrix} \begin{pmatrix} \mathbf{n} \\ \mathbf{b} \end{pmatrix}. \tag{2}$$

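The derivatives of $\mathbf{t}$, $\mathbf{v}$, $\mathbf{s}$ with respect to arc length along $\Gamma$ can be found from (1) and (2):

$$\mathbf{t}' = \kappa_g\mathbf{v} - \kappa_n\mathbf{s}, \qquad \mathbf{v}' = -\kappa_g\mathbf{t} + \tau_g\mathbf{s}, \qquad \mathbf{s}' = \kappa_n\mathbf{t} - \tau_g\mathbf{v} \tag{3}$$

where

$$\kappa_g = \kappa\cos\psi, \qquad \kappa_n = \kappa\sin\psi, \qquad \tau_g = \tau + \frac{d\psi}{ds}.$$

$\kappa_g$ is called the geodesic curvature, $\kappa_n$ is called the normal curvature, and $\tau_g$ is called the
(geodesic) twist.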


It is well known that the instantaneous motion of a moving frame is determined by its
rotational velocity $\vec\omega$ and the translational velocity $\vec T$ of the reference point of the frame. The
translational velocity $\vec T$ of $O$ is just $\mathbf{t}$ and the rotational velocity of the Otvs frame is given by
the vector

$$\vec\omega_d = \tau_g\mathbf{t} + \kappa_n\mathbf{v} + \kappa_g\mathbf{s}.$$

Hence the derivative of any vector in the Otvs frame is given by the vector product of $\vec\omega_d$ and
that vector. It can be seen that the rate of rotation around $\mathbf{t}$ is just $\tau_g$, the rate of rotation
around $\mathbf{v}$ is just $\kappa_n$, and the rate of rotation around $\mathbf{s}$ is just $\kappa_g$.
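
The relations above can be checked with a small sketch. It is only an illustration: the function names are hypothetical, and vectors are written in the coordinates of the Darboux frame itself, so that t = (1, 0, 0), v = (0, 1, 0), and s = (0, 0, 1). Taking the cross product of the rotational velocity with each frame vector should reproduce the right-hand sides of (3).

```python
import math

def darboux_rates(kappa, tau, psi, dpsi_ds):
    """Return (tau_g, kappa_n, kappa_g) from curvature, torsion, and the angle psi.

    psi is the angle between the Frenet-Serret and Darboux frames and
    dpsi_ds is its derivative with respect to arc length.
    """
    kappa_g = kappa * math.cos(psi)   # geodesic curvature: rotation rate about s
    kappa_n = kappa * math.sin(psi)   # normal curvature: rotation rate about v
    tau_g = tau + dpsi_ds             # geodesic twist: rotation rate about t
    return tau_g, kappa_n, kappa_g

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def frame_derivatives(tau_g, kappa_n, kappa_g):
    """Arc-length derivatives of t, v, s, each computed as omega_d x (frame vector)."""
    omega_d = (tau_g, kappa_n, kappa_g)   # components along t, v, s
    t, v, s = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    return cross(omega_d, t), cross(omega_d, v), cross(omega_d, s)
```

For a planar turn (psi = 0, no torsion) the only nonzero rate is the geodesic curvature, i.e. a pure rotation about the surface normal.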


If, instead of using the arc length $s$ as a parameter, the time $t$ is used, the rotational velocity
$\vec\omega_d$ and translational velocity $\vec T$ are scaled by the speed $v(t) = ds/dt$ of $O$ along $\Gamma$.


2.1  Real  Vehicle  Motion 

We will use two coordinate frames to describe vehicle motion: the "real" vehicle frame $C\xi\eta\zeta$
(which moves non-smoothly, in general), defined by its origin $C$, which is the center of mass of
the vehicle, and its axes $C\xi$ (fore/aft), $C\eta$ (crosswise), and $C\zeta$ (up/down); and the ideal vehicle
frame Otvs (the Darboux frame), which corresponds to the smooth motion of the vehicle.

The motion of the vehicle can be decomposed into the motion of the Otvs frame and the
motion of the $C\xi\eta\zeta$ frame relative to the Otvs frame. As we have just seen, the rotational
velocity of the Otvs (Darboux) frame is $v\vec\omega_d = v(\tau_g\mathbf{t} + \kappa_n\mathbf{v} + \kappa_g\mathbf{s})$ and its translational velocity
is $v\mathbf{t}$. We denote the rotational velocity of the $C\xi\eta\zeta$ (vehicle) frame by $\vec\omega_v$ and its translational
velocity by $\vec T_v$.

The position of the $C\xi\eta\zeta$ frame relative to the Otvs frame is given by the displacement vector
$\vec d_{v/d}$ between $C$ and $O$, and the relative orientation of the frames is given by an orthogonal
rotation matrix (matrix of direction cosines) which we denote by $R_{v/d}$. The translational
velocity of the vehicle (the velocity of $C$) is the sum of three terms: (i) the translational velocity
$v\mathbf{t}$ of the Darboux frame, (ii) the translational velocity $\vec T_{v/d} = \dot{\vec d}_{v/d}$, and (iii) the velocity
$v\vec\omega_d \times \vec d_{v/d}$ due to rotation of $C$ in the Otvs frame. The translational velocity of the vehicle
expressed in the Otvs frame is thus $v\vec\omega_d \times \vec d_{v/d} + v\mathbf{t} + \dot{\vec d}_{v/d}$; its translational velocity in the
$C\xi\eta\zeta$ frame is

$$\vec T_v = R_{v/d}^T\,(v\vec\omega_d \times \vec d_{v/d} + v\mathbf{t} + \dot{\vec d}_{v/d}). \tag{4}$$

Similarly, the rotational velocity of $C\xi\eta\zeta$ is the sum of two terms: (i) the rotational velocity
$vR_{v/d}^T\vec\omega_d$ of the Otvs frame, and (ii) the rotational velocity $\vec\omega_{v/d}$, which corresponds to the skew
matrix $\Omega_{v/d} = R_{v/d}^T\dot R_{v/d}$. The rotational velocity of the $C\xi\eta\zeta$ frame expressed in the Otvs frame
is thus $v\vec\omega_d + R_{v/d}\vec\omega_{v/d}$; the corresponding expression in the $C\xi\eta\zeta$ frame is

$$\vec\omega_v = vR_{v/d}^T\vec\omega_d + \vec\omega_{v/d}. \tag{5}$$

Rotations around the fore/aft, sideways, and up/down axes of a vehicle are called roll, pitch,
and yaw, respectively. In terms of our choice of the real vehicle coordinate system, these are
rotations around the $\xi$, $\eta$, and $\zeta$ axes.


2.2  Departures  of  Vehicle  Motion  from  Smoothness 

The  motion  of  a  ground  vehicle  depends  on  many  factors:  the  type  of  intended  motion;  the 
speed  of  the  vehicle;  the  skill  of  the  driver;  the  size,  height  and  weight  of  the  vehicle;  the  type 
and  size  of  the  tires  (or  tractor  treads),  and  the  nature  of  the  suspension  mechanism,  if  any; 
and  the  nature  of  the  surface  on  which  the  vehicle  is  being  driven.  These  factors  tend  to  remain 
constant;  they  undergo  abrupt  changes  only  occasionally,  e.g.  if  a  tire  blows  out,  or  the  vehicle 
suddenly  brakes  or  swerves  to  avoid  an  obstacle,  or  the  type  of  surface  changes.  Such  events 
may  produce  impulsive  changes  in  the  vehicle’s  motion,  but  the  effects  of  these  changes  will 
rapidly  be  damped  out.  In  addition  to  these  occasional  events,  “steady-state”  non-smoothness 
of  a  ground  vehicle’s  motion  may  result  from  roughness  of  the  surface. 

A  ground  vehicle  drives  over  roads  that  have  varying  degrees  of  roughness  [2].  The  roughness 
consists  primarily  of  small  irregularities  in  the  road  surface.  In  discussing  the  effects  of  the 
roughness  of  the  road  on  the  motion  of  a  ground  vehicle  we  will  assume,  for  simplicity,  an 
ordinary,  well  balanced  four-wheeled  vehicle  moving  on  a  planar  surface  that  is  smooth  except 
for  occasional  small  bumps  (protrusions).  The  bumps  are  assumed  to  be  “small”  relative  to  the 
size  of  the  wheels,  so  that  the  effect  of  a  wheel  passing  over  a  bump  is  impulsive.  (We  could  also 
allow  the  surface  to  have  small  depressions,  but  a  large  wheel  cannot  deeply  penetrate  a  small 
depression,  so  the  depressions  have  much  smaller  effects  than  the  bumps.) 

As the vehicle moves over the road surface, each wheel hits bumps repeatedly. We assume that
the vehicle has a suspension mechanism which integrates and damps the impulsive effects of
the bumps. Each suspension element is modeled by a spring with damping; its characteristic
function is a sine function multiplied by an exponential damping function (see [24, 34]). We
assume that the suspension elements associated with the four wheels are independent of each
other and are parallel to the vertical axis of the vehicle.

Figure 2: A possible motion of the base of a vehicle as the vehicle hits a bump.

The vehicle may hit new bumps while the effects of the previous bumps are still being
felt. Each hit forces the suspension and adds to the accumulated energy in the spring; thus
it can happen that the suspension is constantly oscillating, which has the effect of moving the
corners of the vehicle up and down. The period of oscillation is typically on the order of 0.5 sec
(see [24, 34]). In general, it takes several periods to damp out the spring; for example, the
damping ratio provided by shock absorbers of passenger cars is in the range 0.2-0.4. The
maximum velocity of the oscillation is typically on the order of 0.1 m/sec.
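
To get a feeling for these numbers, the response of one suspension element can be sketched as a standard underdamped second-order system. This is an illustrative model only: the function names are hypothetical, the 0.5 sec figure is assumed to be the damped period, and the amplitude of 0.1 m/sec is used as a scale factor rather than as a measured peak.

```python
import math

def suspension_velocity(t, period=0.5, zeta=0.3, amplitude=0.1):
    """Velocity (m/sec) of a suspension element t seconds after an impulse.

    The response is a sine multiplied by an exponential decay. `period` is
    assumed to be the damped oscillation period and `zeta` the damping ratio.
    """
    omega_d = 2.0 * math.pi / period                 # damped angular frequency
    omega_n = omega_d / math.sqrt(1.0 - zeta ** 2)   # undamped natural frequency
    return amplitude * math.exp(-zeta * omega_n * t) * math.sin(omega_d * t)

def decay_per_period(zeta):
    """Factor by which the oscillation envelope shrinks over one damped period."""
    return math.exp(-2.0 * math.pi * zeta / math.sqrt(1.0 - zeta ** 2))

def periods_to_decay(zeta, fraction=0.01):
    """Number of damped periods until the envelope falls to `fraction` of its start."""
    return math.log(fraction) / math.log(decay_per_period(zeta))
```

Under these assumptions a damping ratio of 0.2 leaves roughly a quarter of the envelope after one period, so reducing the oscillation to 1% of its initial envelope takes between three and four periods.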

Consider the coordinate system $Cxyz$ with origin at the center of mass $C$ of the vehicle (see
Figure 2). Let $v_i$ be the velocity of corner $C_i$ of the vehicle, and let the length and width of the
vehicle be $L$ and $W$. From the $v_i$'s we can compute the angular velocity matrix

$$\Omega = \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & (v_3 - v_4)/L \\ 0 & 0 & (v_2 - v_3)/W \\ (v_4 - v_3)/L & (v_3 - v_2)/W & 0 \end{pmatrix}. \tag{6}$$

Note that any of the $v_i$'s can be positive or negative. Multiplication by $\Omega$ can be replaced by
the vector product with the angular velocity vector $\vec\omega = \vec\imath\,(v_3 - v_2)/W + \vec\jmath\,(v_3 - v_4)/L$, where
the rate of rotation around the $x$ axis (the roll velocity) is $\omega_x = (v_3 - v_2)/W$ and the rate of
rotation around the $y$ axis (the pitch velocity) is $\omega_y = (v_3 - v_4)/L$. As noted above, we typically
have $|v_i| < 0.1$ m/sec. If we assume that $W > 1$ m and $L > 2$ m we have $|\omega_x| < 0.2$ rad/sec
and $|\omega_y| < 0.1$ rad/sec. The yaw velocity component is $|\omega_z| = O(v_i^2/\sqrt{W^2 + L^2})$, which is
$|\omega_z| = O(0.01)$ rad/sec. (For a complete derivation see [15].)

The translational velocity vector of the center $C$ of the vehicle is obtained by using the
velocities $v_1 - v_3$, $v_2 - v_3$, $0$, and $v_4 - v_3$ for the corners and adding $v_3$ to the velocity in the
direction of the $z$-axis. We thus have

$$\vec t = \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix} = \begin{pmatrix} -(v_4 - v_3)H/L \\ (v_2 - v_3)H/W \\ (v_2 + v_4)/2 \end{pmatrix}. \tag{7}$$

If we assume that $H < 0.5$ m ($< W/2$) we have $|t_x| < 0.05$ m/sec, $|t_y| < 0.1$ m/sec, and
$|t_z| < 0.1$ m/sec.
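
The bounds quoted for (6) and (7) can be reproduced with a few lines of arithmetic. The sketch below is illustrative: the function name is hypothetical, and the inputs are the corner velocities and vehicle dimensions used in the text, evaluated at the limiting values (corner speeds of 0.1 m/sec, W = 1 m, L = 2 m, H = 0.5 m).

```python
def suspension_motion(v2, v3, v4, W, L, H):
    """Roll/pitch rates and translational velocity from corner velocities.

    v2, v3, v4 : vertical velocities of three corners (m/sec)
    W, L, H    : vehicle dimensions entering (6) and (7), in metres
    Returns ((omega_x, omega_y), (t_x, t_y, t_z)).
    """
    omega_x = (v3 - v2) / W          # roll velocity, from (6)
    omega_y = (v3 - v4) / L          # pitch velocity, from (6)
    t_x = -(v4 - v3) * H / L         # translational components, from (7)
    t_y = (v2 - v3) * H / W
    t_z = (v2 + v4) / 2.0
    return (omega_x, omega_y), (t_x, t_y, t_z)

# Limiting case: corners moving at 0.1 m/sec in opposite directions.
rates, trans = suspension_motion(-0.1, 0.1, -0.1, W=1.0, L=2.0, H=0.5)
```

In this limiting case the roll and pitch rates reach 0.2 and 0.1 rad/sec and the translational components reach 0.05, 0.1, and 0.1 m/sec in magnitude, which matches the bounds stated above.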

We can draw several conclusions from this discussion: (i) The effects of small bumps are of
short duration, i.e. they can be considered to be impulsive. The suspension elements integrate
and damp these effects, resulting in a set of out-of-phase oscillatory motions. (ii) The yaw
component of rotation due to the effect of a bump is very small compared to the roll and
pitch components. (iii) The translational effects of a bump are proportional to the velocities
(or displacements) of the suspension elements and the dimensions of the vehicle and are quite
small.


2.3  The  Sizes  of  the  Smooth  and  Non-smooth  Velocity  Components 

We  now  compare  the  sizes  of  the  velocity  components  which  are  due  to  the  ideal  motion  of 
the  vehicle  —  i.e.,  the  velocity  components  of  the  Darboux  frame  (Section  2)  —  to  the  sizes 
of  the  velocity  components  which  are  due  to  departures  of  the  vehicle  frame  from  the  Darboux 
frame  (Section  2.2). 

The translational velocity of the Darboux frame is just $v\mathbf{t}$; thus the magnitude of the
translational velocity is just $v$. If $v = 10$ m/sec ($= 36$ km/hr $\approx 22$ mi/hr) this velocity is much larger
than the velocities which are due to departures of the vehicle from the Darboux frame, which,
as we have just seen, are on the order of 0.1 m/sec or less.

The rotational velocity of the Darboux frame is $v\vec\omega_d = v(\tau_g\mathbf{t} + \kappa_n\mathbf{v} + \kappa_g\mathbf{s})$; thus the magnitude
of the rotational velocity is $v\sqrt{\tau_g^2 + \kappa_n^2 + \kappa_g^2}$. In this section we will estimate bounds on $\tau_g$,
$\kappa_n$, and $\kappa_g$. Our analysis is based on the analyses in [2, 17, 34] and on the highway design
recommendations published by the American Association of State Highway Officials [1].

Good  highway  design  allows  a  driver  to  make  turns  at  constant  angular  velocities,  and  to 
follow  spiral  arcs  in  transitioning  in  and  out  of  turns,  in  order  to  reduce  undesirable  acceleration 
effects  on  the  vehicle.  A  well-designed  highway  turn  has  also  a  transverse  slope,  with  the  outside 
higher  than  the  inside,  to  counterbalance  the  centrifugal  force  on  the  turning  vehicles.  Thus  the 
ideal  (smooth)  motion  of  a  vehicle  has  piecewise  constant  translational  and  rotational  velocity 
components,  with  smooth  transitions  between  them.  Note  that  the  translational  components 
are  constant  in  the  vehicle  coordinate  frame  even  when  the  vehicle  is  turning,  unless  it  slows 
down  to  make  the  turn. 

To illustrate the typical sizes of these components, consider a ground vehicle moving with
velocity $v$ along a plane curve $\Gamma$ on the surface $\Sigma$. If $\Sigma$ is a plane and $\Gamma$ is a circular arc with
radius of curvature $\rho_g = |\kappa_g|^{-1}$ (i.e., the vehicle is turning with a constant steering angle), the
angular velocity of the vehicle is $v\vec\omega_d = v\kappa_g\mathbf{s}$ and there is a centripetal acceleration $\vec a_c = v^2\kappa_g\mathbf{v}$
at the vehicle's center [33]. As a result there is a centrifugal force on the vehicle proportional to
$\|\vec a_c\|$ and the mass of the vehicle. If skidding is to be avoided the limit on $\|\vec a_c\|$ (see [2]) is given
by

$$\|\vec a_c\| = v^2|\kappa_g| < g(\tan\alpha + \mu_a) \tag{8}$$

where $g$ is the gravitational acceleration, $\alpha$ is the transverse slope, and $\mu_a$ is the coefficient of
adhesion between the wheels and the surface. [Typical values of $\mu_a$ range from 0.8-0.9 for
dry asphalt and concrete to 0.1 for ice (see [34], page 26).] From (8) we have either a lower
bound on $\rho_g$ for a given $v$ or an upper bound on $v$ for a given $\rho_g$. For example, if $v = 30$ m/sec
($\approx 108$ km/hr), $\alpha = 0.05$ rad, and $\mu_a = 0.2$, from $v^2/\rho_g < 0.25g$ we have $\rho_g > 367$ m. This
yields an upper bound on the yaw angular velocity of $v|\kappa_g| = v/\rho_g \approx 0.08$ rad/sec, which
is somewhat larger than the yaw angular velocity arising from the departures from Darboux
motion.
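
The worked example can be reproduced directly from (8). The sketch below is illustrative; the function names are hypothetical and the value g = 9.81 m/sec^2 is an assumption, since the text does not state which value was used.

```python
import math

G = 9.81  # gravitational acceleration in m/sec^2 (assumed value)

def min_turn_radius(v, alpha, mu_a, g=G):
    """Smallest radius of curvature (m) that satisfies the no-skid limit (8).

    v     : vehicle speed (m/sec)
    alpha : transverse slope of the road (rad)
    mu_a  : coefficient of adhesion between wheels and surface
    """
    return v ** 2 / (g * (math.tan(alpha) + mu_a))

def max_yaw_rate(v, alpha, mu_a, g=G):
    """Largest smooth yaw angular velocity (rad/sec) at speed v, i.e. v / rho_g."""
    return v / min_turn_radius(v, alpha, mu_a, g)

rho = min_turn_radius(30.0, 0.05, 0.2)   # about 367 m
yaw = max_yaw_rate(30.0, 0.05, 0.2)      # about 0.08 rad/sec
```

Raising the adhesion coefficient to the dry-asphalt range shrinks the minimum radius considerably, which is why the slippery-road value is the one that constrains the design.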

Other dynamic constraints on a vehicle such as the limits on torques and forces can be
used to obtain constraints on $\tau_g$ and $\kappa_n$. (These and other considerations such as safety and
comfort were used in [1] to make recommendations for highway design; for a summary of these
recommendations see [15].) For both vertical curves (crossing a hill) and turning curves the
(recommended lower bound on the) radius of curvature $\rho_{\min}$ grows with the square of the design
velocity $v_d$. However, the resulting (design) yaw and pitch angular velocities are limited by
$v_d/\rho_{\min}$. Thus for smaller velocities $v$ the vehicle can negotiate tighter vertical and turning
curves and thus have even larger values of the yaw and pitch angular velocities. Typical values
of the roll and pitch angular velocities are given in [15].

For  realistic  vehicle  speeds  we  can  conclude  the  following  about  the  impulsive  and  smooth 
translational  and  rotational  velocity  components  of  the  vehicle  [15].  The  impulsive  effects  on 
the  translational  velocity  are  approximately  two  orders  of  magnitude  smaller  than  the  smooth 
velocity  components  themselves.  Impulsive  effects  on  the  yaw  angular  velocity  are  somewhat 
smaller  than  the  smooth  yaw  component  arising  from  worst-case  turns  of  the  road;  for  moderate 
turns  the  impulsive  effects  are  comparable  in  size  to  the  smooth  yaw  velocity.  Impulsive  effects 
on  the  roll  angular  velocity  are  approximately  an  order  of  magnitude  larger  than  the  smooth 
roll  component  arising  from  worst-case  twists  (and  turns)  of  the  road;  for  gentler  twists  the 
smooth  roll  velocity  is  even  smaller.  Similarly,  impulsive  effects  on  the  pitch  angular  velocity 
are approximately an order of magnitude larger than the smooth pitch velocity arising from
worst-case changes of vertical slope (i.e., vertical curves) of the road; for gentler vertical curves
the smooth pitch angular velocity is even smaller. (The impulsive effects are not significantly affected
by turns, twists, or vertical slope.) We can thus conclude that impulsive effects on the roll and
pitch angular velocities are significant and larger than the corresponding smooth velocities, and
that impulsive effects on the yaw angular velocity are on the order of the smooth yaw velocity.


2.4  Camera  Motion 

Assume that a camera is mounted on the vehicle; let $\vec d_c$ be the position vector of the mass center
of the vehicle relative to the nodal point of the camera. The orientation of the vehicle coordinate
system relative to the camera is given by an orthogonal rotation matrix (a matrix of
the direction cosines) which we denote by $R_c$. The columns of $R_c$ are the unit vectors of the
vehicle coordinate system expressed in the camera coordinate system. We will assume that the
position and orientation of the vehicle relative to the camera coordinate system do not change
as the vehicle moves. Thus we will assume that $R_c$ and $\vec d_c$ are constant and known.

Given the position $\vec p_e$ of a scene point $E$ in the vehicle coordinate system, its position
$\vec r_e$ in the camera coordinate system is given by

$$\vec r_e = R_c\vec p_e + \vec d_c.$$

Since $R_c$ and $\vec d_c$ are constant we have $\dot{\vec r}_e = R_c\dot{\vec p}_e$. The velocity of $E$ is given by

$$\dot{\vec r}_e = -\vec\omega \times \vec r_e - \vec T. \tag{9}$$

In this expression, the rotational velocity is $\vec\omega = R_c\vec\omega_v$ (see (5)), and the translational velocity
is $\vec T = R_c\vec T_v + \vec\omega \times \vec d_c$ (see (4)), where $\vec\omega_v$ and $\vec T_v$ are the rotational and translational velocities of
the vehicle coordinate system.

We saw in Section 2.3 that both $\|v\vec\omega_d\|$ and $\|\vec\omega_{v/d}\|$ are $O(0.1)$ rad/sec. The factors $R_c$ and
$R_{v/d}$, being rotation matrices, do not affect the magnitude of either $\vec\omega$ or $\vec T$. Thus the two
components of rotational velocity have comparable magnitudes.

As regards the translational components, note that for normal speeds of the vehicle
($v > 10$ m/sec $\approx 22$ mi/hr), typical suspension elements, and the camera mounted on the vehicle close
to the center of mass we have (see Section 2.3) $\|\vec d_{v/d}\| = O(0.025)$ m, $\|\dot{\vec d}_{v/d}\| = O(0.1)$ m/sec,
and $\|\vec d_c\| = O(1)$ m. The magnitudes of the translational velocity components are thus
$\|v\vec\omega_d \times \vec d_{v/d}\| \le v\|\vec\omega_d\|\,\|\vec d_{v/d}\| = O(0.0025)$ m/sec; $\|v\mathbf{t}\| = v = O(10)$ m/sec; and
$\|\vec\omega \times \vec d_c\| \le \|\vec\omega\|\,\|\vec d_c\| = O(0.1)$ m/sec. Therefore, the dominant term in the expression for $\vec T$ is $v\mathbf{t}$ since it is
two orders of magnitude larger than any of the other three terms of $\vec T$.


2.5  Independently  Moving  Vehicles 

We are interested in other vehicles that are moving nearby. We assume the other vehicles are all
moving in the same direction. To facilitate the derivation of the motion equations of a rigid body
$B$ we use two rectangular coordinate frames, one ($Oxyz$) fixed in space, the other ($C_1x_1y_1z_1$) fixed
in the body and moving with it. The position of the moving frame at any instant is given by the
position $\vec d_1$ of the origin $C_1$, and by the nine direction cosines of the axes of the moving frame
with respect to the fixed frame. For a given position $\vec p$ of $P$ in $C_1x_1y_1z_1$ we have the position $\vec r_p$
of $P$ in $Oxyz$

$$\vec r_p = R\vec p + \vec d_1 \tag{10}$$

where $R$ is the matrix of the direction cosines (the frames are taken as right-handed so that
$\det R = 1$). The velocity of $P$ in $Oxyz$ is given by

$$\dot{\vec r}_p = \vec\omega_1 \times (\vec r_p - \vec d_1) + \dot{\vec d}_1 \tag{11}$$

where $\dot{\vec d}_1$ is the translational velocity vector and $\vec\omega_1$ is the rotational velocity
vector.

From Section 2.3 we have for a typical vehicle $\|\vec\omega_1\| = O(0.1)$ rad/sec and $\|\vec r_p - \vec d_1\| = O(1)$ m.
We thus have $\|\vec\omega_1 \times (\vec r_p - \vec d_1)\| \le \|\vec\omega_1\|\,\|\vec r_p - \vec d_1\| = O(0.1)\,O(1) = O(0.1)$ m/sec. For
the translational velocity we have $\|\dot{\vec d}_1\| = v$; for normal speeds of the vehicle $v > 10$ m/sec
$\approx 22$ mi/hr, so that $\|\dot{\vec d}_1\| = O(10)$ m/sec. We can conclude that for any point $P$ on the vehicle the
translational velocity is two orders of magnitude larger than the rotational velocity.

If we make the fixed frame $Oxyz$ correspond to the camera frame at time $t$ we have from (9)
and (11) that the velocity of a point $P$ on the vehicle expressed in the camera frame is given by

$$\dot{\vec r}_p = -\vec\omega \times \vec r_p + \vec\omega_1 \times (\vec r_p - \vec d_1) - \vec T + \dot{\vec d}_1. \tag{12}$$

In (12) the vector $-\vec T + \dot{\vec d}_1$ corresponds to the relative translational velocity between the camera
and the independently moving vehicle. Regarding the first and the second terms on the r.h.s.
of (12) we can see that for comparable rotational velocities $\vec\omega$ and $\vec\omega_1$ the first term will dominate
the second term since usually $\|\vec r_p - \vec d_1\| \ll \|\vec r_p\|$. We will use this observation later.
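
A small numerical sketch makes the relative sizes of the terms in (12) concrete. It is only an illustration: the function names are hypothetical and the numbers (a point 1 m from the observed vehicle's origin, 20 m ahead of the camera, equal rotation rates of 0.1 rad/sec) are assumed for the example.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def relative_point_velocity(r_p, omega, T, d1, omega1, d1_dot):
    """Velocity of point P in the camera frame, term by term as in (12).

    Returns (ego_rotation, body_rotation, relative_translation, total).
    """
    ego_rot = tuple(-c for c in cross(omega, r_p))           # -omega x r_p
    offset = tuple(p - d for p, d in zip(r_p, d1))           # r_p - d_1
    body_rot = cross(omega1, offset)                         # omega_1 x (r_p - d_1)
    rel_trans = tuple(dd - t for dd, t in zip(d1_dot, T))    # -T + d_1 dot
    total = tuple(a + b + c for a, b, c in zip(ego_rot, body_rot, rel_trans))
    return ego_rot, body_rot, rel_trans, total

ego, body, trans, total = relative_point_velocity(
    r_p=(1.0, 0.0, 20.0), omega=(0.0, 0.1, 0.0), T=(0.0, 0.0, 10.0),
    d1=(0.0, 0.0, 20.0), omega1=(0.0, 0.1, 0.0), d1_dot=(0.0, 0.0, 12.0))
```

With these assumed values the observer's rotation contributes about 2 m/sec while the observed vehicle's own rotation contributes about 0.1 m/sec, which is the dominance argument made in the text.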


3  Image  Motion 


3.1  The  Imaging  Models 


Let $(X, Y, Z)$ denote the Cartesian coordinates of a scene point with respect to the fixed camera
frame (see Figure 3), and let $(x, y)$ denote the corresponding coordinates in the image plane.
The equation of the image plane is $Z = f$, where $f$ is the focal length of the camera. The
perspective projection onto this plane is given by

$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}. \tag{13}$$

For weak perspective projection we need a reference point $(X_c, Y_c, Z_c)$. A scene point $(X, Y, Z)$
is first projected onto the point $(X, Y, Z_c)$; then, through plane perspective projection, the point
$(X, Y, Z_c)$ is projected onto the image point $(x, y)$. The projection equations are then given by

$$x = \frac{X}{Z_c}f, \qquad y = \frac{Y}{Z_c}f. \tag{14}$$
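
The two imaging models differ only in which depth divides the scene coordinates. A minimal sketch, with hypothetical function names, shows (13) and (14) side by side.

```python
def perspective(X, Y, Z, f):
    """Perspective projection (13): divide by the point's own depth Z."""
    return f * X / Z, f * Y / Z

def weak_perspective(X, Y, Z_c, f):
    """Weak perspective projection (14): divide by the reference depth Z_c."""
    return f * X / Z_c, f * Y / Z_c

# A point at depth 10 on an object whose reference depth is 8.
full = perspective(2.0, 1.0, 10.0, f=1.0)
weak = weak_perspective(2.0, 1.0, 8.0, f=1.0)
```

The two projections coincide for points lying at the reference depth, so the weak perspective approximation is best when the depth variation across an object is small compared with its distance from the camera.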


3.2  The  Image  Motion  Field  and  the  Optical  Flow  Field 

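The instantaneous velocity of the image point $(x, y)$ under perspective projection is obtained by
taking the derivatives of (13) and using (9), with $\vec T = (U, V, W)^T$ and $\vec\omega = (\omega_x, \omega_y, \omega_z)^T$:

$$\dot x = f\frac{\dot X Z - X\dot Z}{Z^2} = \frac{-Uf + xW}{Z} + \omega_x\frac{xy}{f} - \omega_y\left(\frac{x^2}{f} + f\right) + \omega_z y, \tag{15}$$

$$\dot y = f\frac{\dot Y Z - Y\dot Z}{Z^2} = \frac{-Vf + yW}{Z} + \omega_x\left(\frac{y^2}{f} + f\right) - \omega_y\frac{xy}{f} - \omega_z x. \tag{16}$$

Figure 3: The plane perspective projection image of $P$ is $F = f(X/Z, Y/Z, 1)$; the weak
perspective projection image of $P$ is obtained through the plane perspective projection of the
intermediate point $P_1 = (X, Y, Z_c)$ and is given by $G = f(X/Z_c, Y/Z_c, 1)$.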

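The instantaneous velocity of the image point $(x, y)$ under weak perspective projection can
be obtained by taking derivatives of (14) with respect to time and using (9):

$$\dot x = f\frac{\dot X Z_c - X\dot Z_c}{Z_c^2} = \frac{-Uf + xW}{Z_c} \;\cdots \tag{17}$$

$$\dot y = f\frac{\dot Y Z_c - Y\dot Z_c}{Z_c^2} = \frac{-Vf + yW}{Z_c} \;\cdots \tag{18}$$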


Let $\vec\imath$ and $\vec\jmath$ be the unit vectors in the $x$ and $y$ directions, respectively; $\dot{\vec r} = \dot x\,\vec\imath + \dot y\,\vec\jmath$ is the
projected motion field at the point $\vec r = x\,\vec\imath + y\,\vec\jmath$. If we choose a unit direction vector $\vec n_r$ at the
image point $\vec r$ and call it the normal direction, then the normal motion field at $\vec r$ is $\dot{\vec r}_n = (\dot{\vec r}\cdot\vec n_r)\,\vec n_r$.
$\vec n_r$ can be chosen in various ways; the usual choice (as we shall now see) is the direction of the
image intensity gradient.

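Let $I(x, y, t)$ be the image intensity function. The time derivative of $I$ can be written as

$$\frac{dI}{dt} = \frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = (I_x\,\vec{\imath} + I_y\,\vec{\jmath}) \cdot (\dot{x}\,\vec{\imath} + \dot{y}\,\vec{\jmath}) + I_t = \nabla I \cdot \dot{\vec{r}} + I_t$$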


where $\nabla I$ is the image gradient and the subscripts denote partial derivatives.

If we assume $dI/dt = 0$, i.e. that the image intensity does not vary with time, then we have
$\nabla I \cdot \vec{u} + I_t = 0$. The vector field $\vec{u}$ in this expression is called the optical flow.
If we choose the normal direction $\vec{n}_r$ to be the image gradient direction, i.e.
$\vec{n}_r = \nabla I / \|\nabla I\|$, we then have

$$\vec{u}_n = (\vec{u} \cdot \vec{n}_r)\,\vec{n}_r = \frac{-I_t \nabla I}{\|\nabla I\|^2} \tag{19}$$

where $\vec{u}_n$ is called the normal flow.


It was shown in [31] that the magnitude of the difference between $\vec{u}_n$ and the normal motion
field $\dot{\vec{r}}_n$ is inversely proportional to the magnitude of the image gradient. Hence
$\dot{\vec{r}}_n \approx \vec{u}_n$ when $\|\nabla I\|$ is large. Equation (19) thus provides an approximate
relationship between the 3-D motion and the image derivatives. We will use this approximation later in this paper.
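
As a rough illustration of the normal flow $-I_t \nabla I / \|\nabla I\|^2$, the following NumPy sketch computes the signed normal flow magnitude and the gradient direction from two consecutive gray-level frames. The function name, the finite-difference derivative estimates (`np.gradient` and a simple frame difference), and the gradient threshold `min_grad` are assumptions made for this sketch, not the implementation used in the report.

```python
import numpy as np


def normal_flow(frame0, frame1, min_grad=1e-3):
    """Return (un, nx, ny, valid): the signed normal flow -I_t / ||grad I||,
    the unit gradient direction (nx, ny), and a mask of usable pixels."""
    I0 = np.asarray(frame0, dtype=float)
    I1 = np.asarray(frame1, dtype=float)
    # Spatial derivatives of the averaged frame; axis 0 is y (rows), axis 1 is x.
    Iy, Ix = np.gradient(0.5 * (I0 + I1))
    It = I1 - I0                          # temporal derivative, one frame apart
    mag = np.hypot(Ix, Iy)
    valid = mag > min_grad                # normal flow is unreliable for weak gradients
    safe = np.where(valid, mag, 1.0)      # avoid division by zero
    un = np.where(valid, -It / safe, 0.0)
    nx = np.where(valid, Ix / safe, 0.0)
    ny = np.where(valid, Iy / safe, 0.0)
    return un, nx, ny, valid


# Synthetic check: an intensity ramp shifted by 0.3 pixels in x.
ys, xs = np.mgrid[0:32, 0:32].astype(float)
frame0 = 2.0 * xs + 0.5 * ys
frame1 = 2.0 * (xs - 0.3) + 0.5 * ys
un, nx, ny, valid = normal_flow(frame0, frame1)
```

For this ramp the true flow is (0.3, 0), so the computed `un` should equal the projection `0.3 * nx` at every pixel. On real images only pixels with a large gradient magnitude should be trusted, in line with the result quoted from [31].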


3.3  Estimation  of  Rotation 


We  now  describe  our  algorithm  for  estimating  rotation.  In  this  section  we  give  only  a  brief 
description  of  the  algorithm.  A  full  description  and  a  proof  of  correctness  will  be  given  in  a 
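forthcoming paper. We shall use the following notation: Let $I$ be the image intensity at $\vec{r}$, and
let $\vec{n}_r = n_x\,\vec{\imath} + n_y\,\vec{\jmath} = \nabla I / \|\nabla I\|$ be the direction of the image
intensity gradient at $\vec{r}$. The normal motion field at $\vec{r}$ is the projection of the image motion
field onto the gradient direction $\vec{n}_r$ and is given by
$\dot{\vec{r}}_n = (\dot{\vec{r}} \cdot \vec{n}_r)\,\vec{n}_r$. From (15-16) we have

$$\begin{aligned}
\dot{\vec{r}}_n \cdot \vec{n}_r = n_x\dot{x} + n_y\dot{y} ={}& \frac{1}{Z}\left[n_x(-Uf + xW) + n_y(-Vf + yW)\right] \\
& + \left[n_x\frac{xy}{f} + n_y\left(\frac{y^2}{f} + f\right)\right]\omega_x
  - \left[n_x\left(\frac{x^2}{f} + f\right) + n_y\frac{xy}{f}\right]\omega_y
  + (n_x y - n_y x)\,\omega_z.
\end{aligned} \tag{20}$$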


The first term on the r.h.s. of (20) is the translational normal motion $\dot{\vec{r}}_t \cdot \vec{n}_r$
and the remaining three terms are the rotational normal motion $\dot{\vec{r}}_\omega \cdot \vec{n}_r$.
From now on we will assume that the camera is forward-looking, i.e. that the focus of expansion (FOE)
is in the image.

The normal flow at $\vec{r}$ is defined as $-I_t / \|\nabla I\|$. From [31] we know that the magnitude of the
difference between the normal flow field and the normal motion field is inversely proportional
to the gradient magnitude; we can thus write

$$\dot{\vec{r}}_n \cdot \vec{n}_r = \dot{\vec{r}}_t \cdot \vec{n}_r + \dot{\vec{r}}_\omega \cdot \vec{n}_r = \vec{u}_n \cdot \vec{n}_r + O(\|\nabla I\|^{-1}). \tag{21}$$


If the camera motion is a pure translation the image motion field is a radial pattern; the
magnitude of each image motion vector is proportional to the distance of the image point from
the focus of expansion (FOE) and inversely proportional to the depth of the corresponding scene
point. If the position $\vec{r}_0 = x_0\,\vec{\imath} + y_0\,\vec{\jmath}$ of the FOE is known the translational
motion field can be obtained from the translational normal motion field by multiplying each
$\dot{\vec{r}}_t \cdot \vec{n}_r$ by a vector whose direction is $(\vec{r} - \vec{r}_0)$ and whose magnitude is
inversely proportional to the cosine of the angle between the normal direction $\vec{n}_r$ and
$(\vec{r} - \vec{r}_0)$. The translational motion field is then given by

$$\dot{\vec{r}}_t = (\dot{\vec{r}}_t \cdot \vec{n}_r)\,\frac{\vec{r} - \vec{r}_0}{(\vec{r} - \vec{r}_0) \cdot \vec{n}_r}. \tag{22}$$
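
Note that if we knew $\vec{\omega}$ we could compute the rotational motion field $\dot{\vec{r}}_\omega$ and the
rotational normal motion $\dot{\vec{r}}_\omega \cdot \vec{n}_r$ and use (21) to obtain

$$\dot{\vec{r}}_t \cdot \vec{n}_r \approx \vec{u} \cdot \vec{n}_r - \dot{\vec{r}}_\omega \cdot \vec{n}_r. \tag{23}$$

If we combine (22) and (23) we have

$$\dot{\vec{r}}_t \approx (\vec{u} \cdot \vec{n}_r - \dot{\vec{r}}_\omega \cdot \vec{n}_r)\,\frac{\vec{r} - \vec{r}_0}{(\vec{r} - \vec{r}_0) \cdot \vec{n}_r}. \tag{24}$$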


If the FOE is known or can be estimated, we can use (24) to estimate the rotational velocity
vector $\vec{\omega}$ by minimizing $\sum \|\dot{\vec{r}}_t\|^2$. Indeed, the image motion field in the
neighborhood of the FOE is composed of the translational image motion and the rotational image motion.
The roll component of the rotational image motion is orthogonal to the translational image motion, so
that it increases the magnitude of the image motion field and the normal motion field. The
yaw and the pitch components of the rotational image motion are approximately constant in
the neighborhood of the FOE and just shift the position of the singular (zero) point of the flow
field [14]. Furthermore, the rotational normal motion accounts for most of the image motion
field at the distant image points [15]. Therefore, if we subtract the rotational image motion,
the sum of the magnitudes of the resulting (translational) flow field will be minimal. Using (20)
and (24) we then have

$$\vec{\omega} = \arg\min_{\vec{\omega}} \sum \|\vec{u} \cdot \vec{n}_r - \dot{\vec{r}}_\omega \cdot \vec{n}_r\|^2\,\frac{\|\vec{r} - \vec{r}_0\|^2}{\|(\vec{r} - \vec{r}_0) \cdot \vec{n}_r\|^2}. \tag{25}$$


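In matrix form this problem corresponds to minimizing $\|A\vec{\omega} - \vec{b}\|$ (see [29]) where the
rows $\vec{a}_i$ of $A$ are given by

$$\vec{a}_i = \frac{\|\vec{r} - \vec{r}_0\|}{\|(\vec{r} - \vec{r}_0) \cdot \vec{n}_r\|}\left(n_x\frac{xy}{f} + n_y\left(\frac{y^2}{f} + f\right),\; -n_x\left(\frac{x^2}{f} + f\right) - n_y\frac{xy}{f},\; n_x y - n_y x\right) \tag{26}$$

and the elements $b_i$ of $\vec{b}$ are given by

$$b_i = \vec{u} \cdot \vec{n}_r\,\frac{\|\vec{r} - \vec{r}_0\|}{\|(\vec{r} - \vec{r}_0) \cdot \vec{n}_r\|}.$$

The solution is given by

$$\vec{\omega} = A^{+}\vec{b}$$

where $A^{+}$ is the generalized inverse of $A$ (see [29]).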
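
The least squares step can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, `un` denotes the signed normal flow $\vec{u} \cdot \vec{n}_r$ at each point, the row coefficients are taken from the rotational terms of (20), and points where $(\vec{r} - \vec{r}_0)$ is nearly perpendicular to $\vec{n}_r$ are discarded with an assumed threshold `min_cos` so that the weights stay bounded.

```python
import numpy as np


def rotation_rows(x, y, nx, ny, f):
    """Coefficients of (omega_x, omega_y, omega_z) in the rotational normal motion."""
    return np.stack([
        nx * x * y / f + ny * (y * y / f + f),
        -(nx * (x * x / f + f) + ny * x * y / f),
        nx * y - ny * x,
    ], axis=1)


def estimate_rotation(x, y, nx, ny, un, foe, f, min_cos=0.2):
    """Least squares estimate of omega minimizing ||A omega - b||."""
    dx, dy = x - foe[0], y - foe[1]
    dist = np.hypot(dx, dy)
    proj = dx * nx + dy * ny                  # (r - r0) . n_r
    keep = np.abs(proj) > min_cos * dist      # drop near-degenerate points (assumption)
    w = dist[keep] / np.abs(proj[keep])       # ||r - r0|| / |(r - r0) . n_r|
    A = w[:, None] * rotation_rows(x[keep], y[keep], nx[keep], ny[keep], f)
    b = w * un[keep]
    omega, *_ = np.linalg.lstsq(A, b, rcond=None)
    return omega


# Synthetic check with a purely rotational normal flow field.
rng = np.random.default_rng(0)
n = 200
x, y = rng.uniform(-100, 100, n), rng.uniform(-100, 100, n)
theta = rng.uniform(0, 2 * np.pi, n)
nx, ny = np.cos(theta), np.sin(theta)
f = 500.0
omega_true = np.array([0.01, -0.02, 0.005])
un = rotation_rows(x, y, nx, ny, f) @ omega_true
omega_est = estimate_rotation(x, y, nx, ny, un, foe=(0.0, 0.0), f=f)
```

In this rotation-only test the system is consistent, so the estimate matches `omega_true` up to rounding. When a translational component is present the minimizer of (25) is only an approximation to the true rotation, which is why the method relies on distant points and on the neighborhood of the FOE.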


3.4  Estimating  the  FOE 

In the case of a forward-looking camera and an unknown FOE it is possible to simultaneously
estimate the FOE and $\vec{\omega}$. Based on our analysis in Section 2 we observe that, in the case of a
camera rigidly connected to the vehicle and a scene without independently moving objects, the
direction of the translational velocity of the vehicle remains approximately constant throughout
the image sequence. The rotational velocity $\vec{\omega}$, however, depends on both the geometry of the
road and the activity of the suspension elements; hence, $\vec{\omega}$ changes as the vehicle moves. It is
possible to choose a part of the road without horizontal and/or vertical curves and/or changes
of lateral slope so that the vehicle rotations correspond to activities of the suspension elements.
In these cases the rotational velocity components change in an almost periodic manner. If the
scenes are chosen so that significant parts of images correspond to distant scene points the
following algorithm can be used to reliably estimate the FOE.

Given $N$ successive image frames taken by a forward-looking camera mounted on a moving
vehicle we form the following function

$$\varphi = \sum_{k=1}^{N} \sum_{i=1}^{N_k} \|\vec{u}_i \cdot \vec{n}_r - \dot{\vec{r}}_{\omega_k} \cdot \vec{n}_r\|^2\,\frac{\|\vec{r}_i - \vec{r}_0\|^2}{\|(\vec{r}_i - \vec{r}_0) \cdot \vec{n}_r\|^2} \tag{27}$$

where the inside sum is over all normal flow vectors in the $k$th frame, and the outside sum is
over all frames; the position of the FOE does not change throughout the sequence ($N$ frames),
but $\vec{\omega}$ changes at every frame (hence the index $k$). We estimate $\vec{\omega}_k$, $k = 1, \ldots, N$,
and $\vec{r}_0$ by minimizing $\varphi$.

A straightforward method for minimizing $\varphi$ corresponds to a nonlinear least squares problem.
It can be observed from (27) that the residuals of $\varphi$ are linear in the $\vec{\omega}_k$’s and nonlinear
in $\vec{r}_0$. In the numerical analysis literature such problems are called “separable nonlinear least
squares problems” [26]. Several algorithms for solving such problems have been proposed; a unifying
feature of these algorithms is that they perform better than standard nonlinear least squares
algorithms [26] since they are based on solving problems of smaller dimensionality than the
unseparated problem.

In this paper we use a simple version of the separable algorithm. We choose an initial $\vec{r}_0$ and
solve for the $\vec{\omega}_k$ vectors. This problem is equivalent to solving $N$ linear least squares problems
as described in Section 3.3. We then have $\vec{\omega}_k = A_k^{+}\vec{b}_k$ with the $A_k$’s and the $\vec{b}_k$’s
appropriately defined. Once the $\vec{\omega}_k$’s are estimated we have a nonlinear least squares problem in
the two components of $\vec{r}_0$: we need to minimize $\varphi$ for the fixed $\vec{\omega}_k$’s. From (27) we then have

$$\vec{r}_0 = \arg\min_{\vec{r}_0} \sum_{k=1}^{N} \sum_{i=1}^{N_k} a_{k,i}^2\,\frac{\|\vec{r}_i - \vec{r}_0\|^2}{\|(\vec{r}_i - \vec{r}_0) \cdot \vec{n}_r\|^2} \tag{28}$$


where $a_{k,i} = \|\vec{u}_i \cdot \vec{n}_r - \dot{\vec{r}}_{\omega_k} \cdot \vec{n}_r\|$. We use the Gauss-Newton
algorithm to solve this problem. The partial derivatives $\partial\varphi/\partial x_0$ and
$\partial\varphi/\partial y_0$ that are required by the Gauss-Newton algorithm are readily obtained from (27-28).

After solving for $\vec{r}_0$ we substitute $\vec{r}_0$ in (27) and solve for the $\vec{\omega}_k$’s; we then use these
values to solve for $\vec{r}_0$ again. The process is repeated until $\vec{r}_0$ converges. A simple test for
convergence is $\|\vec{r}_0(s) - \vec{r}_0(s+1)\| < 1$, where $s$ is the iteration number.
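
The alternation described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: all function names are hypothetical, the Jacobian with respect to the FOE is approximated by finite differences instead of the analytic derivatives mentioned in the text, the Gauss-Newton step is damped by step halving, and the weight $\|\vec{r} - \vec{r}_0\| / |(\vec{r} - \vec{r}_0) \cdot \vec{n}_r|$ is clamped with an assumed `min_cos` to keep it bounded. Each frame is a tuple `(x, y, nx, ny, un)` of arrays, with `un` the signed normal flow.

```python
import numpy as np


def rot_rows(x, y, nx, ny, f):
    """Coefficients of omega in the rotational normal motion."""
    return np.stack([nx * x * y / f + ny * (y * y / f + f),
                     -(nx * (x * x / f + f) + ny * x * y / f),
                     nx * y - ny * x], axis=1)


def weight(r0, x, y, nx, ny, min_cos=0.2):
    """||r - r0|| / |(r - r0) . n_r|; clamping by min_cos is an assumption."""
    dx, dy = x - r0[0], y - r0[1]
    cos = np.abs(dx * nx + dy * ny) / np.maximum(np.hypot(dx, dy), 1e-9)
    return 1.0 / np.maximum(cos, min_cos)


def residuals(r0, omegas, frames, f):
    """Residual vector whose squared norm is the objective phi."""
    return np.concatenate([
        weight(r0, x, y, nx, ny) * (un - rot_rows(x, y, nx, ny, f) @ om)
        for om, (x, y, nx, ny, un) in zip(omegas, frames)])


def phi(r0, omegas, frames, f):
    return float(np.sum(residuals(r0, omegas, frames, f) ** 2))


def solve_omegas(r0, frames, f):
    """For a fixed FOE, solve one linear least squares problem per frame."""
    omegas = []
    for x, y, nx, ny, un in frames:
        w = weight(r0, x, y, nx, ny)
        A = w[:, None] * rot_rows(x, y, nx, ny, f)
        omegas.append(np.linalg.lstsq(A, w * un, rcond=None)[0])
    return omegas


def gauss_newton_step(r0, omegas, frames, f, h=1e-3):
    """One damped Gauss-Newton step in r0 with a finite-difference Jacobian."""
    res = residuals(r0, omegas, frames, f)
    J = np.stack([(residuals(r0 + d, omegas, frames, f) - res) / h
                  for d in (np.array([h, 0.0]), np.array([0.0, h]))], axis=1)
    step = np.linalg.lstsq(J, -res, rcond=None)[0]
    for _ in range(20):                       # halve the step until phi decreases
        cand = r0 + step
        if np.sum(residuals(cand, omegas, frames, f) ** 2) < np.sum(res ** 2):
            return cand
        step = 0.5 * step
    return r0


def estimate_foe(frames, f, r0_init, max_iter=50, tol=1.0):
    """Alternate omega_k solves and r0 updates until r0 moves by less than tol."""
    r0 = np.asarray(r0_init, dtype=float)
    for _ in range(max_iter):
        omegas = solve_omegas(r0, frames, f)
        r0_new = gauss_newton_step(r0, omegas, frames, f)
        done = np.linalg.norm(r0_new - r0) < tol
        r0 = r0_new
        if done:
            break
    return r0, solve_omegas(r0, frames, f)


# Synthetic frames: radial translational flow about foe_true plus a per-frame rotation.
rng = np.random.default_rng(1)
f, foe_true = 500.0, np.array([5.0, -3.0])
frames = []
for _ in range(3):
    x, y = rng.uniform(-120, 120, 150), rng.uniform(-90, 90, 150)
    th = rng.uniform(0, 2 * np.pi, 150)
    nx, ny = np.cos(th), np.sin(th)
    om = rng.normal(0, 0.01, 3)
    un = (0.02 * ((x - foe_true[0]) * nx + (y - foe_true[1]) * ny)
          + rot_rows(x, y, nx, ny, f) @ om)
    frames.append((x, y, nx, ny, un))

r0_init = np.array([0.0, 0.0])
phi_init = phi(r0_init, solve_omegas(r0_init, frames, f), frames, f)
r0_est, omegas_est = estimate_foe(frames, f, r0_init)
phi_final = phi(r0_est, omegas_est, frames, f)
```

Because each half-step either solves a linear least squares problem exactly or accepts only a decreasing step, the objective cannot increase from one iteration to the next. The sketch makes no claim about how close `r0_est` comes to the true FOE; that depends on the scene, in particular on how many points are distant.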

3.5  The Image Motion Field of an Observed Vehicle


From (17-18) we obtain the (approximate) equations of projected motion for points on a vehicle
under weak perspective:

$$\dot{x} = \frac{Uf - xW}{Z_c}, \tag{29}$$

$$\dot{y} = \frac{Vf - yW}{Z_c}. \tag{30}$$

Equations (29-30) relate the image (projected) motion field to the scaled translational velocity
$Z_c^{-1}\vec{T} = Z_c^{-1}(U\ V\ W)^T$.

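Given the point $\vec{r} = x\,\vec{\imath} + y\,\vec{\jmath}$ and the normal direction
$\vec{n} = n_x\,\vec{\imath} + n_y\,\vec{\jmath}$, from (29-30) the normal motion field
$\dot{\vec{r}}_n \cdot \vec{n} = n_x\dot{x} + n_y\dot{y}$ is given by

$$\dot{\vec{r}}_n \cdot \vec{n} = n_x f\,U Z_c^{-1} + n_y f\,V Z_c^{-1} - (n_x x + n_y y)\,W Z_c^{-1}. \tag{31}$$

Let

$$\vec{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} n_x f \\ n_y f \\ -n_x x - n_y y \end{pmatrix}, \qquad \vec{c} = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} U Z_c^{-1} \\ V Z_c^{-1} \\ W Z_c^{-1} \end{pmatrix}. \tag{32}$$

Using (32) we can write (31) as $\dot{\vec{r}}_n \cdot \vec{n} = \vec{a}^T\vec{c}$. The column vector $\vec{a}$ is
formed of observable quantities only, while each element of the column vector $\vec{c}$ contains
quantities which are not directly observable from the images. To estimate $\vec{c}$ we need estimates of
$\dot{\vec{r}}_n \cdot \vec{n}$ at three or more image points.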


3.6  Estimating  Vehicle  Motion  from  Normal  Flow 


As in Section 3.3, we use linear least squares to estimate the parameter vector $\vec{c}$ from the
normal flow.


In the case of a moving vehicle the parameters of interest are the vehicle’s trajectory and its
rate of approach. The rate of approach

$$\nu = -\frac{W}{Z_c}$$

(measured in sec$^{-1}$) is equivalent to the inverse of the time to collision and corresponds to the
rate with which an object is approaching the camera (or receding from it). The rate $\nu = 0.1$/sec
means that every second the object travels 0.1 of the distance between the observer and its
current position. A negative rate of approach means that the object is going away from the
camera.
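
A minimal NumPy sketch of this estimation step is given below. It assumes the weak perspective normal motion model of Section 3.5, with one row $(n_x f,\ n_y f,\ -n_x x - n_y y)$ per normal flow measurement on the vehicle, and takes the rate of approach to be the negative of the third component of $\vec{c}$, as stated in Section 4.3. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def estimate_scaled_translation(x, y, nx, ny, un, f):
    """Least squares solution of a_i^T c = un_i, with c = (U, V, W) / Zc."""
    A = np.stack([nx * f, ny * f, -(nx * x + ny * y)], axis=1)
    c, *_ = np.linalg.lstsq(A, un, rcond=None)
    return c


def rate_of_approach(c):
    """nu = -W / Zc; positive when the vehicle approaches the camera."""
    return -c[2]


# Synthetic normal flow on a vehicle that closes 10% of its distance per second.
rng = np.random.default_rng(2)
n = 100
x, y = rng.uniform(-40, 40, n), rng.uniform(-30, 30, n)
th = rng.uniform(0, 2 * np.pi, n)
nx, ny = np.cos(th), np.sin(th)
f = 500.0
c_true = np.array([0.01, 0.0, -0.1])
un = nx * f * c_true[0] + ny * f * c_true[1] - (nx * x + ny * y) * c_true[2]

c_est = estimate_scaled_translation(x, y, nx, ny, un, f)
nu = rate_of_approach(c_est)
```

With noise-free data the three scaled components are recovered exactly; at least three measurements with sufficiently different normal directions and positions are needed for the system to have full rank.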


4  Experiments 

In  Sections  4.1  and  4.2  we  give  examples  illustrating  road  detection,  stabilization,  and  vehicle 
detection.  In  Section  4.3  we  present  results  for  several  sequences  showing  vehicles  in  motion. 

4.1  Road and Vehicle Detection


We detect the road region by finding road edges and lane markers. A Canny edge detector is
applied and Hough-like voting is used to detect dominant straight lines in the image. Each line
is parameterized by its normal angle $\alpha$ and its displacement $d$ from the center of the image. For
all possible values of $\alpha$ and $d$ the image is scanned along the corresponding line. The number
of edge points that are found within a strip along the line, and whose gradient direction is
orthogonal to the line direction, is taken as the vote for the corresponding point in the $\alpha$-$d$
plane. Among those lines, only the ones with a certain weight and orientation are chosen to be
road line candidates.

If several lines with close $\alpha$ and $d$ values are candidates, only the best representative of these
lines is chosen. The other lines are eliminated by applying local maximum suppression in the
$\alpha$-$d$ plane. Note that if the lines represent the two edges of a lane marker, the gradient directions
will have opposite signs for the two edges.

Due  to  perspectivity,  the  road  boundaries  and  lane  marker  lines  should  converge  to  a  single 
point. Candidate lines that do not converge to a single point are not identified as road or lane
boundaries. The identified lines are the maximal subsets of candidate lines that all intersect at
or  close  to  one  point  (all  the  intersection  points  of  every  pair  of  the  lines  are  located  in  a  small 
region). 
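
The convergence test on candidate lines can be sketched in plain Python as follows. The line model $x\cos\alpha + y\sin\alpha = d$, the function names, the brute-force subset search, and the `radius` parameter that defines a "small region" are assumptions made for this illustration; the report does not specify these details.

```python
import itertools
import math


def intersection(line_a, line_b):
    """Intersection of two lines x cos(alpha) + y sin(alpha) = d, or None if nearly parallel."""
    (a1, d1), (a2, d2) = line_a, line_b
    det = math.sin(a2 - a1)
    if abs(det) < 1e-9:
        return None
    x = (d1 * math.sin(a2) - d2 * math.sin(a1)) / det
    y = (d2 * math.cos(a1) - d1 * math.cos(a2)) / det
    return x, y


def converge(lines, radius):
    """True if every pairwise intersection lies within `radius` of their centroid."""
    pts = [intersection(p, q) for p, q in itertools.combinations(lines, 2)]
    if not pts or any(p is None for p in pts):
        return False
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return all(math.hypot(p[0] - cx, p[1] - cy) <= radius for p in pts)


def largest_convergent_subset(lines, radius):
    """Largest subset of candidates whose pairwise intersections cluster together.
    Exhaustive search; acceptable only for a handful of candidate lines."""
    for size in range(len(lines), 1, -1):
        for subset in itertools.combinations(lines, size):
            if converge(subset, radius):
                return list(subset)
    return []


# Three candidates through a common vanishing point plus one outlier line.
vp = (10.0, -40.0)
through_vp = [(a, vp[0] * math.cos(a) + vp[1] * math.sin(a)) for a in (0.3, 1.2, 2.0)]
candidates = through_vp + [(0.8, 200.0)]
road_lines = largest_convergent_subset(candidates, radius=2.0)
```

In this example the outlier is rejected because its intersections with the other lines lie far from the common vanishing point, so only the three convergent lines are kept.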

Figure 4 presents some road detection and vehicle detection results for four different
sequences (collected in three different countries).


4.2  Stabilization 

As  was  shown  in  [15]  the  impulsive  effects  introduce  significant  changes  in  the  roll  and  pitch 
angular  velocities  (see  Figure  5).  Figure  6  shows  three  examples  of  image  sequence  stabilization 
by  compensating  the  rotational  effects  of  road  bumps.  The  estimated  rotational  normal  flow 
component  (column  c)  is  subtracted  from  the  total  normal  flow  (column  b),  which  yields  the 
translational normal flow component (column d). One can see that the translational normal flow
components  at  distant  points  are  close  to  zero. 

The  vector  of  rotation  is  estimated  by  using  the  method  based  on  FOE  calculation,  as 
described  in  Section  3.3,  or  alternatively,  by  estimating  the  amount  of  rotation  from  the  apparent 
shifts  of  distant  points.  Two  examples  of  distant  point  identification,  using  horizon  points,  are 
shown  in  Figure  7. 


4.3  Relative  Motions  of  Vehicles 

After  derotating  and  detecting  moving  vehicles,  we  can  analyze  their  motions  using  the  algorithm 
for  motion  estimation  under  weak  perspective. 

Figure 4: (a-d) A selected frame from each of four sequences. Top: the input images. Middle:
results of road detection. Bottom: results of vehicle detection.

In the first experiment we used an image sequence taken in Haifa, Israel, from a vehicle
following two other accelerating vehicles. The sequence consisted of 90 frames (slightly less
than three seconds). Figure 8 shows frames 0, 30 and 60, and the corresponding normal flow
on the vehicles. Figure 9 shows estimated values of $UZ^{-1}$, $VZ^{-1}$, and $WZ^{-1}$ for the central
(closest) vehicle. These values correspond to the translations of the vehicles relative to the
vehicle carrying the camera (i.e., in the observer coordinate system). Because of our choice of
coordinate system the rate of approach $\nu$ is equal to the negative of $WZ^{-1}$.

The  graphs  show  that  the  motion  components  have  a  simple  behavior;  before  they  reach 
their  extremal  values  they  can  be  approximated  by  straight  lines,  indicating  constant  relative 
accelerations. 

In the second experiment we used an image sequence of a van, taken in France, from another
vehicle following the van [10, 16]. The sequence consisted of 56 frames (slightly less than two
seconds). Figure 10 shows frames 5, 15, 25, and 35 as well as the corresponding normal flow.
Figure 11 shows estimated values of $UZ^{-1}$, $VZ^{-1}$, and $WZ^{-1}$. The graph shows that there
is an impending collision (rate of approach greater than 1 sec$^{-1}$). Around the 20th frame the
rate of approach becomes zero (as do all the velocity components) and after that it becomes
negative because the van starts pulling away from the vehicle carrying the camera. A similar
image sequence was used in [10] in studies of vehicle convoy behavior.

Figure 5: (a-b) Two images taken 1/15th of a second apart; (c-d) their normal flows. One can
see the effects of bumps. In the first frame, the flow vectors point downward; in the second,
they point upward.

The third sequence (taken from the IEN Galileo Ferraris Vision Image Library) consisted
of 26 frames. Figure 12 shows frames 1, 14 and 26, as well as the corresponding normal flow.
Figure 13 shows estimated values of $UZ^{-1}$, $VZ^{-1}$, and $WZ^{-1}$. The graph shows that the $W$
component of the translational velocity is dominant over the $U$ and $V$ components, as is
expected for a vehicle that overtakes the observer vehicle and does not change lanes; the two
vehicles are moving on parallel courses.

Figure 14 shows frames 1, 26 and 47 of another 48-frame sequence, taken in Haifa, Israel, as
well as the corresponding normal flow. Figure 15 shows $UZ^{-1}$, $VZ^{-1}$, and $WZ^{-1}$ graphs for the
left  (overtaking)  and  central  vehicles.  One  can  see  that  the  graphs  differ  mainly  in  the  values  of 
the  W  component,  since  the  relative  speed  of  approach  for  the  left  vehicle  is  greater  than  that 
for  the  central  one.  The  U  and  V  components  are  relatively  small;  all  three  vehicles  are  moving 
in  the  same  direction. 


5  Related  Work 

5.1  Understanding  Object  Motion 

Some  work  has  been  done  on  understanding  object  motion,  but  this  work  has  almost  always 
assumed  a  stationary  viewpoint.  Understanding  object  motion  is  based  on  extracting  the  object’s 
motion  parameters  from  an  image  sequence.  Broida  and  Chellappa  [6]  proposed  a  framework 
for  motion  estimation  of  a  vehicle  using  Kalman  filtering.  Weng  et  al.  [32]  assumed  an  object 
that  possesses  an  axis  of  symmetry,  and  a  constant  angular  momentum  model  which  constrained 
the  motion  over  a  local  frame  subsequence  to  be  a  superposition  of  precession  and  translation. 
The  trajectory  of  the  center  of  rotation  can  be  approximated  by  a  vector  polynomial.  Changing 
the parameters of the model with time allows adaptation to long-term changes in the motion
characteristics.

Figure 6: Stabilization results for one frame from each of three sequences: (a) Input frame, (b)
Normal flow, (c) Rotational normal flow, (d) Translational normal flow.

In  [16]  Duric  et  al.  tried  to  understand  the  motions  of  objects  such  as  tools  and  vehicles, 
based  on  the  fact  that  the  natural  axes  of  the  object  tend  to  remain  aligned  with  the  local 
trihedron defined by the object’s trajectory. Based on this observation they used the Frenet-Serret
motion model, and showed that knowing how the Frenet-Serret frame is changing relative
to  the  observer  gives  essential  information  for  understanding  the  object’s  motion.  Our  present 
work  is  a  continuation  of  this  work  in  a  more  realistic  and  complicated  scenario,  in  which  the 
camera  is  also  moving. 


5.2  Independent  Motion  Detection 

Much  work  has  been  done  on  detection  of  independently  moving  objects  by  a  moving  observer. 
However,  the  work  has  been  related  to  detection,  classification,  and  tracking  of  the  motion,  and 
has  not  paid  much  attention  to  motion  estimation.  Clarke  and  Zisserman  in  [11]  addressed  the 
problem  of  independently  moving  rigid  object  detection,  assuming  that  all  motions  (including 
the  camera  motion)  are  pure  translations.  The  idea  is  to  track  a  set  of  feature  points  using 
correlation  and  movement  assumptions  derived  from  the  previous  frames,  and,  based  on  the 
feature point correspondences, determine the epipole (FOE) for the background points as the
intersection of the features’ image plane trajectories. The assumption is that a majority of the
feature  points  are  background  points.  Moving  objects  are  found  by  fitting  an  epipole  to  those 
feature  matches  that  are  not  consistent  with  the  background  epipole.  The  image  plane  extent  of 
the  moving  object  is  defined  by  the  smallest  rectangle  enclosing  these  features.  The  instability 
of  the  camera  introduces  strong  rotational  components  into  the  relative  motion;  these  are  not 
dealt  with  in  [11]. 

Torr and Murray in [30] used statistical methods to detect a non-rigid motion. A five-dimensional
space of image pixels and spatio-temporal intensity gradients is fit to an affine
transformation  model  in  a  least  squares  sense,  while  identifying  the  outliers  to  the  fit  using 
statistical  diagnostics.  The  outliers  are  spatially  clustered  to  form  sets  of  pixels  representing 
independently  moving  objects.  The  assumption  again  is  that  the  majority  of  the  pixels  come 
from  background  points  and  their  movement  is  well  approximated  by  a  linear  or  affine  vector 
field,  i.e.  the  distances  to  the  background  points  are  large  compared  to  the  variations  in  these 
distances. 

Nelson in [23] suggested two qualitative methods for motion detection. The first uses knowledge
about observer motion to detect independently moving objects by looking for points whose
projected  velocity  behaviors  are  not  consistent  with  the  constraints  imposed  by  the  observer’s 
motion.  The  second  method  is  used  to  detect  so-called  “animate  motion”,  which  can  be  found 
by  looking  for  violations  of  the  motion  field  smoothness  over  time.  This  is  valid  in  cases  where 
the  observer  motion  changes  slowly,  while  the  apparent  motion  of  the  moving  object  changes 
rapidly. 

Irani  et  al.  in  [20]  used  temporal  integration  to  construct  a  dynamic  internal  representation 
image of the tracked object. It is assumed that the motion can be approximated by 2D parametric
transformations in the image plane. Given a pair of images, a dominant motion is first
computed  and  the  corresponding  object  is  excluded  from  the  region  of  analysis.  This  process 
is  then  repeated  and  other  objects  are  found.  To  track  the  objects  in  a  long  image  sequence, 
integrated  images  registered  with  respect  to  the  tracked  motion  are  used. 

Figure 8: Frames 0, 30, and 60 of a sequence showing two vehicles accelerating. The normal
flow results are shown below the corresponding image frames.

Sharma  and  Aloimonos  in  [28]  demonstrated  a  method  for  detecting  independent  motions 
of  compact  objects  by  an  active  observer  whose  motion  can  be  constrained.  Fejes  and  Davis 
in [18] used constraints on low-dimensional projected components of flow fields to detect
independent motion. They implemented a recursive filter to extract directional motion parameters.
Independent  motion  detection  was  done  by  a  combination  of  repeated-median-based  line-fitting 
and  one-dimensional  search.  The  method  was  demonstrated  on  scenes  with  large  amounts  of 
clutter. 

5.3  Vehicle  Detection  and  Tracking 

In previous work on vehicle detection and tracking by a moving observer, the detection made
use of model-based object recognition in single frames. Gil et al. [19] combined multiple motion
estimators for vehicle tracking. Vehicle detection was performed using two features: the
bounding rectangle of the moving vehicle, where the convex hull of the vehicle is computed for
every frame and then translated according to the predicted motion parameters, and an updated
2-D pattern (gray-level mask) based on optimization of the correlation between the pattern and
the image using the motion parameters. These results were obtained using a stationary camera
mounted above a highway under different road and illumination conditions.

Figure 9: Motion analysis results for the acceleration sequence. U, V, W are the scaled (by the
unknown inverse distance $Z^{-1}$) components of the relative translational velocity.

Betke  et  al.  [3]  developed  a  real-time  system  for  detection  and  tracking  of  multiple  vehicles  in 
a  frame  sequence  taken  on  a  highway  from  a  moving  vehicle.  The  system  distinguishes  between 
distant  and  passing  vehicle  detection.  In  case  of  a  passing  vehicle  the  recognition  is  performed 
by  detecting  large  brightness  differences  over  small  numbers  of  frames.  2-D  car  models  are  used 
to  create  a  gray-scale  template  of  the  detected  vehicle  for  future  tracking.  Distant  vehicles  are 
detected  by  analyzing  prominent  horizontal  and  vertical  edges.  A  square  region  bounded  by  such 
edges,  which  is  strongly  enough  correlated  with  a  vehicle  template,  is  recognized  as  a  vehicle.  For 
each  newly  recognized  vehicle  a  separate  tracking  process  is  allocated  by  a  real-time  operating 
system,  which  tracks  the  vehicle  until  it  disappears  and  makes  sure  that  no  other  process  tracks 
the same vehicle. When one vehicle occludes another, one of the tracking processes terminates
and  the  other  tracks  the  occlusion  region  as  a  single  moving  object. 


5.4  Road  Detection  and  Shape  Analysis 

Our  work  also  involved  detection  of  the  road  markings;  in  doing  this  we  made  no  significant 
use  of  motion  models.  There  has  been  considerable  work  on  road  boundary  detection  in  images 
obtained  by  a  vehicle  driving  on  the  road.  A  typical  detection  procedure  involves  two  steps. 
First  the  road  is  detected  in  a  single  frame  (or  in  a  small  number  of  frames),  and  then  the  road 
boundaries  are  tracked  in  a  long  sequence  of  frames. 

Schneiderman and Nashman in [27] left the first step to a human operator and gave a solution
for the second step alone. Their algorithm is based on road markings. In each frame, edge points
are extracted and grouped into clusters representing individual lane markers. The lane marker
models built in the first step are updated using the information obtained from successive frames.
The markers are modeled by second-order polynomials in the image plane. Tracking is exploited
in the sense that edge points are associated with markers based on a road model formed from
analysis of the previous frames. Spatial proximity and gradient direction are used as clues for
clustering. The marker models are updated to satisfy a least squares criterion of optimality.

Figure 10: Frames 5, 15, 25, and 35 of the van sequence. The normal flow results are shown
below the corresponding image frames.

Broggi  in  [7]-[9]  presented  a  parallel  real-time  algorithm  for  road  boundary  detection.  First 
the  image  undergoes  an  inverse  perspective  transformation  and  then  road  markings  are  detected 
using  simple  morphological  filters  that  work  in  parallel  on  different  image  areas.  The  inverse 
perspective  procedure  is  based  on  a  priori  knowledge  of  the  imaging  process  (camera  position, 
orientation,  optics,  etc.). 

Richter  and  Wetzel  in  [25]  used  region  segmentation  for  road  surface  recognition.  A  road 
model  is  adjusted  to  the  segmentation  results  collected  from  several  frames.  The  adjusted  model 
is  then  used  for  model-based  segmentation  and  tracking.  A  special  lookup  mechanism  is  used 
to  overcome  the  problem  of  obstacles  and  shadows  that  may  cause  the  segmentation  algorithm 
to  fail  to  detect  the  road  regions. 

A  sophisticated  road  model  that  takes  into  account  both  horizontal  and  vertical  curvature 
was suggested by Dickmanns in [12]. A differential-geometric road representation and a spatio-temporal
driving model are exploited to find the parameters of the road and the vehicle.


Figure 11: Motion analysis results for the van sequence. U, V, W are the scaled (by the unknown
inverse distance $Z^{-1}$) components of the relative translational velocity.

6  Conclusions  and  Plans  for  Future  Work 


Understanding  the  motions  of  vehicles  from  images  taken  by  a  moving  observer  requires  a 
mathematical  formulation  of  the  relationships  between  the  observer’s  motion  and  the  image 
motion  field,  as  well  as  a  model  for  the  other  vehicles’  trajectories  and  their  contributions  to  the 
image  motion  field.  In  this  paper  a  constant  relationship  between  each  vehicle’s  frame  and  the 
observer’s frame is assumed. The use of the Darboux frame provides a vocabulary appropriate
for  describing  long  motion  sequences. 

We  have  derived  equations  for  understanding  the  relative  motions  of  vehicles  in  traffic  scenes 
from  a  sequence  of  images  taken  by  a  moving  vehicle.  We  use  the  Darboux  motion  model  for 
both  the  observing  vehicle  and  the  nearby  moving  vehicles.  Using  a  full  perspective  imaging 
model  we  stabilize  the  observer  sequence  so  that  our  model  for  the  observed  vehicles’  motions 
can  be  applied.  Using  the  weak  perspective  approximation  we  analyze  the  nearby  vehicles’ 
motions  and  apply  this  analysis  to  long  image  sequences.  Expanding  our  analysis  to  various 
classes of traffic events [21], and to articulated vehicles, are directions for future research.